CLAIMS

What is claimed is:

1. A method for determining whether records are similar in a database containing
both structured and unstructured, free-text data, the method comprising the steps 
of: 

accessing two of the records from the database for evaluation; and 
evaluating a match between the two records as a weighted match between 
each of a plurality of available fields, such that a matching process is selected as 
appropriate from among a group of matching processes including strict Boolean, 
ordinal, and vector-based matching processes, wherein: 

when a strict Boolean matching process is selected, applying a 
match function as an exact match test; 

when an ordinal matching process is selected, applying a match 
function that makes use of information concerning the size and ordering of 
the data domain; and 

when a vector-based matching process is selected, applying a match
function that uses a vector space frequency test.
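By way of illustration only, the three match functions recited above might be sketched as follows. The claim does not fix any formula, so the rank-distance formula in `ordinal_match` and the use of cosine similarity over term counts as the "vector space frequency test" are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def boolean_match(a, b):
    """Strict Boolean process: an exact match test (1.0 or 0.0)."""
    return 1.0 if a == b else 0.0


def ordinal_match(a, b, domain):
    """Ordinal process (assumed formula): uses the size and ordering of the
    data domain. `domain` is the ordered list of all permitted values."""
    if len(domain) < 2:
        return boolean_match(a, b)
    distance = abs(domain.index(a) - domain.index(b))
    return 1.0 - distance / (len(domain) - 1)


def vector_match(a, b):
    """Vector-based process (assumed): cosine similarity between the
    term-frequency vectors of two free-text values."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(c * c for c in ta.values()))
    norm_b = math.sqrt(sum(c * c for c in tb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0
```

Each function returns a score between 0 and 1, so the per-field results can be combined in a weighted sum regardless of which process produced them.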

2. The method of claim 1 wherein the step of evaluating a match between the two
records comprises applying the matching process to determine a match score for
two corresponding fields of the plurality of available fields, the two corresponding
fields selected from corresponding locations in each of the two records.

3. The method of claim 1 wherein the step of evaluating a match between the two
records comprises selecting the matching process based on a common data type 
shared by both of two fields of the plurality of available fields accessed in the two 
records. 

4. The method of claim 3 wherein, when a Boolean matching process is selected, the
data type of both of the two fields specifies nominal data. 

5. The method of claim 3 wherein, when an ordinal matching process is selected, the
data type of both of the two fields specifies data capable of being ordered. 

6. The method of claim 3 wherein, when a vector-based matching process is
selected, the data type of both of the two fields specifies text data. 
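By way of illustration only, selecting the matching process from a data type shared by both fields might be sketched as follows. The type labels `"nominal"`, `"ordinal"`, and `"text"`, the `MatchProcess` enumeration, and the `select_process` function are hypothetical names chosen for this sketch; the claims do not specify how data types are represented.

```python
from enum import Enum


class MatchProcess(Enum):
    BOOLEAN = "boolean"
    ORDINAL = "ordinal"
    VECTOR = "vector"


# Assumed mapping: nominal data -> Boolean, orderable data -> ordinal,
# text data -> vector-based.
PROCESS_FOR_TYPE = {
    "nominal": MatchProcess.BOOLEAN,
    "ordinal": MatchProcess.ORDINAL,
    "text": MatchProcess.VECTOR,
}


def select_process(type_i, type_j):
    """Return the matching process for two fields that share a data type."""
    if type_i != type_j:
        raise ValueError(f"fields do not share a data type: {type_i!r}, {type_j!r}")
    try:
        return PROCESS_FOR_TYPE[type_i]
    except KeyError:
        raise ValueError(f"no matching process for data type {type_i!r}") from None
```

Rejecting mismatched types reflects the requirement that the data type be common to both fields; how an implementation would handle fields of differing types is not addressed here.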

7. The method of claim 1 wherein the step of evaluating the match between the two
records comprises calculating a similarity score between the two records, as 
follows: 

$$\mathrm{sim}(\mathrm{record}_i, \mathrm{record}_j) = w_1 \cdot \mathrm{match}(a_{1i}, a_{1j}) + w_2 \cdot \mathrm{match}(a_{2i}, a_{2j}) + \cdots + w_n \cdot \mathrm{match}(a_{ni}, a_{nj})$$

wherein sim is a similarity function that determines the similarity
score for the two records;

$\mathrm{record}_i$ is a first record of the two records and is identified in the
database by an iterator $i$;

$\mathrm{record}_j$ is a second record of the two records and is identified in
the database by an iterator $j$;

iterator $n$ identifies a field position for a given field $a_{ni}$ in the
$\mathrm{record}_i$ and a corresponding field position for a given field $a_{nj}$ in the
$\mathrm{record}_j$;

match indicates the match function; and
a symbol $w_n$ indicates a predefined weight for each result of each
match function.
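By way of illustration only, the weighted sum recited above might be computed as in the following sketch. The function name `similarity` and the convention of passing one match function per field position are assumptions made for this example.

```python
def similarity(record_i, record_j, weights, match_fns):
    """Weighted sum of per-field match scores for two records.

    record_i, record_j: sequences of field values in the same field order.
    weights: one predefined weight per field position.
    match_fns: one match function per field position (hypothetical convention).
    """
    if not (len(record_i) == len(record_j) == len(weights) == len(match_fns)):
        raise ValueError("records, weights, and match functions must align")
    return sum(
        w * match(a_i, a_j)
        for w, match, a_i, a_j in zip(weights, match_fns, record_i, record_j)
    )


def exact(a, b):
    """Exact match test, used here for every field to keep the example short."""
    return 1.0 if a == b else 0.0


score = similarity(
    ("F", "A", "x"),
    ("F", "B", "x"),
    weights=[0.5, 0.25, 0.25],
    match_fns=[exact, exact, exact],
)
# Fields 1 and 3 match, field 2 does not: 0.5*1 + 0.25*0 + 0.25*1 = 0.75
```

If the weights sum to 1 and every match function returns a value between 0 and 1, the similarity score also lies between 0 and 1; the claim itself does not require normalized weights.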

8. The method of claim 1 wherein the database is a relational database, the records
are tuples, and the fields are attributes.
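By way of illustration only, accessing records as tuples of attributes from a relational database might look like the following sketch, which uses Python's standard `sqlite3` module with an in-memory database. The table name `report`, its columns, and the sample rows are hypothetical and are not taken from the claims.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory relational database with hypothetical data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE report (id INTEGER PRIMARY KEY, sex TEXT, severity TEXT, note TEXT)"
    )
    conn.executemany(
        "INSERT INTO report (id, sex, severity, note) VALUES (?, ?, ?, ?)",
        [
            (1, "F", "mild", "headache after dose"),
            (2, "F", "severe", "severe headache and nausea"),
        ],
    )
    return conn


def fetch_record(conn, record_id):
    """Return one record as a tuple of attribute values, in column order."""
    row = conn.execute(
        "SELECT sex, severity, note FROM report WHERE id = ?", (record_id,)
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    return row


conn = make_demo_db()
record_i = fetch_record(conn, 1)
record_j = fetch_record(conn, 2)
```

Here each row is a tuple and each column is an attribute, so the field at a given position in `record_i` corresponds to the field at the same position in `record_j`.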

9. A data processing system for determining whether records are similar in a
database containing both structured and unstructured, free-text data, the data 
processing system comprising: 

a communications interface for communicating with the database; and 
a processor coupled to the communications interface, the processor
hosting and executing a data evaluation application that is configured to: 

access two of the records from the database for evaluation; and 
evaluate a match between the two records as a weighted match 
between each of a plurality of available fields, such that a matching
process is selected as appropriate from among a group of matching 
processes including strict Boolean, ordinal, and vector-based matching 
processes, wherein: 

when a strict Boolean matching process is selected, apply a 
match function as an exact match test; 

when an ordinal matching process is selected, apply a 
match function that makes use of information concerning the size 
and ordering of the data domain; and 

when a vector-based matching process is selected, apply a 
match function that uses a vector space frequency test. 

10. The data processing system of claim 9 wherein the data evaluation application is
configured to apply the matching process to determine a match score for two 
corresponding fields of the plurality of available fields, the two corresponding 
fields selected from corresponding locations in each of the two records. 

11. The data processing system of claim 9 wherein the data evaluation application is
configured to select the matching process based on a common data type shared by 
both of two fields of the plurality of available fields accessed in the two records. 

12. The data processing system of claim 11 wherein, when the data evaluation
application selects a Boolean matching process, the data type of both of the two 
fields specifies nominal data. 

13. The data processing system of claim 11 wherein, when the data evaluation
application selects an ordinal matching process, the data type of both of the two 
fields specifies data capable of being ordered. 

14. The data processing system of claim 11 wherein, when the data evaluation
application selects a vector-based matching process, the data type of both of the 
two fields specifies text data. 

15. The data processing system of claim 9 wherein the data evaluation application is
configured to calculate a similarity score between the two records, as follows: 

$$\mathrm{sim}(\mathrm{record}_i, \mathrm{record}_j) = w_1 \cdot \mathrm{match}(a_{1i}, a_{1j}) + w_2 \cdot \mathrm{match}(a_{2i}, a_{2j}) + \cdots + w_n \cdot \mathrm{match}(a_{ni}, a_{nj})$$

wherein sim is a similarity function that determines the similarity
score for the two records;

$\mathrm{record}_i$ is a first record of the two records and is identified in the
database by an iterator $i$;

$\mathrm{record}_j$ is a second record of the two records and is identified in
the database by an iterator $j$;

iterator $n$ identifies a field position for a given field $a_{ni}$ in the
$\mathrm{record}_i$ and a corresponding field position for a given field $a_{nj}$ in the
$\mathrm{record}_j$;

match indicates the match function; and

a symbol $w_n$ indicates a predefined weight for each result of each
match function.

16. The data processing system of claim 9 wherein the database is a relational
database, the records are tuples, and the fields are attributes. 